**Backtest:**

We use a long-short, equally weighted portfolio backtest for all our models. Other papers tend to build a portfolio from only the top 10 long predictions and the top 10 short predictions, but we prefer to include all predictions in our portfolio. We assume that we can buy at the market open and liquidate at the market close. We do not account for transaction costs or slippage because we did not have adequate resources to do so.
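The backtest rule described above can be sketched for a single trading day as follows. This is a minimal illustration, not the code used for the reported results: the function name `long_short_daily_return`, the dictionary inputs and the +1/-1 encoding of predictions are assumptions made for the example. It treats the return of a short position as the negative of the open-to-close return and, like the backtest, ignores transaction costs and slippage.

```python
from statistics import mean


def long_short_daily_return(predictions, open_prices, close_prices):
    """Gross return (1.0 = flat) of an equally weighted long-short portfolio.

    predictions: dict mapping ticker -> +1 (long) or -1 (short)  [assumed encoding]
    open_prices / close_prices: dict mapping ticker -> price on that day
    """
    position_returns = []
    for ticker, side in predictions.items():
        # Buy (or short) at the open, liquidate at the close.
        intraday = close_prices[ticker] / open_prices[ticker] - 1.0
        position_returns.append(side * intraday)
    if not position_returns:
        # No predictions on this day: hold cash instead of averaging an empty list.
        return 1.0
    # Equal weights: every prediction contributes the same share of capital.
    return 1.0 + mean(position_returns)


day_return = long_short_daily_return(
    {"AAA": 1, "BBB": -1},
    {"AAA": 100.0, "BBB": 50.0},
    {"AAA": 102.0, "BBB": 49.0},
)
print(round(day_return, 4))
```

Returning 1.0 on days without predictions is one possible convention; it avoids taking the mean of an empty list.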

**Proof-of-Concept 1: Reuters dataset 1 (2017-2020)**

The data is split into training data (2017-2018) and test data (2019-2020). The text data is preprocessed with text normalization, stemming, lemmatization and stop-word removal.
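A rough sketch of the chronological split and part of the text preprocessing is given below. It is an illustration under assumptions, not the pipeline used here: the record layout (`date`, `headline`), the helper names and the tiny `STOP_WORDS` set are hypothetical, and stemming and lemmatization are left out because they need a language library such as NLTK.

```python
import re
from datetime import date

# Hypothetical, deliberately tiny stop-word list; a real run would use a full list.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for", "on"}


def normalize(text):
    """Lower-case the text, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def split_by_year(rows, last_train_year=2018):
    """Chronological split: rows up to last_train_year train, later rows test."""
    train = [row for row in rows if row["date"].year <= last_train_year]
    test = [row for row in rows if row["date"].year > last_train_year]
    return train, test


rows = [
    {"date": date(2017, 3, 1), "headline": "Acme shares rise on strong earnings"},
    {"date": date(2019, 6, 5), "headline": "The outlook for Acme is cut"},
]
train_rows, test_rows = split_by_year(rows)
print(len(train_rows), len(test_rows))
print(normalize(test_rows[0]["headline"]))
```

Splitting by date rather than at random keeps every test headline later than every training headline, which avoids look-ahead bias in the backtest.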

*Model accuracy on the validation dataset:*

- NLTK VADER Sentiment Analyzer: N/A
- Linear Classifier: 53%
- Sentimetre Model 1: 53%
- Sentimetre Model 2: 57%

*Prediction accuracy on the test dataset:*

- NLTK VADER Sentiment Analyzer: 50%
- Linear Classifier: 52%
- Sentimetre Model 1: 51%
- Sentimetre Model 2: 55%
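The text does not define how prediction accuracy is measured. One plausible reading, sketched here as an assumption, is directional accuracy: the share of predictions whose sign matches the sign of the realized return. The function name `directional_accuracy` and the sample numbers are hypothetical.

```python
def directional_accuracy(predicted, realized):
    """Fraction of predictions whose direction matches the realized return.

    predicted: sequence of scores, > 0 meaning "up" (assumed encoding)
    realized:  sequence of realized returns, > 0 meaning the price rose
    A value of exactly 0 counts as "not up" on either side.
    """
    if len(predicted) != len(realized):
        raise ValueError("predicted and realized must have the same length")
    if not predicted:
        raise ValueError("no predictions to score")
    hits = sum(1 for p, r in zip(predicted, realized) if (p > 0) == (r > 0))
    return hits / len(predicted)


print(directional_accuracy([1, -1, 1, -1], [0.5, -0.2, -0.1, 0.3]))
```

Under this definition, a model that guesses at random on a balanced sample would score about 50%.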

df_a.head(50)
Unnamed: 0 Unnamed: 0.1 Unnamed: 0.1.1 Unnamed: 0.1.1.1 Unnamed: 0.1.1.1.1 Unnamed: 0.1.1.1.1.1 Unnamed: 0.1.1.1.1.1.1 Unnamed: 0.1.1.1.1.1.1.1 Unnamed: 0.1.1.1.1.1.1.1.1 level_0 ... returnpredvader returnpredsgd dailyaveragereturn dailyaveragereturnvader dailyaveragereturnsgd cumreturn1b cumreturn1d cumreturn1e cumreturndow cumreturnsp500
0 23 23 23 23 23 23 23 23 23 23 ... -3.147169 3.147169 1.001532 1.014097 0.988675 1.001532 1.014097 0.988675 0.975243 0.971729
1 112 112 112 112 112 112 112 112 112 112 ... -3.348837 3.348837 1.020781 1.002606 1.010641 1.022345 1.016740 0.999195 1.008729 1.003723
2 153 153 153 153 153 153 153 153 153 153 ... -5.204246 -5.204246 0.994020 0.995286 0.981924 1.016231 1.011947 0.981134 1.015801 1.007929
3 239 239 239 239 239 239 239 239 239 239 ... 0.929615 -0.929615 0.999709 0.991541 1.002966 1.015935 1.003387 0.984044 1.025649 1.018899
4 284 284 284 284 284 284 284 284 284 284 ... -0.929615 -0.929615 1.001278 0.990225 1.004563 1.017233 0.993579 0.988534 1.029852 1.022825
5 349 349 349 349 349 349 349 349 349 349 ... 12.653061 -12.653061 1.009986 1.005615 0.990158 1.027392 0.999158 0.978805 1.034506 1.028085
6 403 403 403 403 403 403 403 403 403 403 ... -0.319614 -0.319614 1.011374 1.008723 1.012217 1.039077 1.007874 0.990763 1.034354 1.027829
7 464 464 464 464 464 464 464 464 464 464 ... -0.690608 0.690608 1.003347 1.003949 1.004104 1.042555 1.011854 0.994829 1.028916 1.024141
8 532 532 532 532 532 532 532 532 532 532 ... 0.585652 -0.585652 1.035546 1.052613 1.002859 1.079613 1.065090 0.997673 1.039948 1.030812
9 617 617 617 617 617 617 617 617 617 617 ... 2.448730 2.448730 1.008794 1.010841 1.011141 1.089107 1.076637 1.008787 1.042258 1.036876
10 690 690 690 690 690 690 690 690 690 690 ... -0.699153 -0.699153 1.005840 1.010511 0.995994 1.095468 1.087953 1.004746 1.050171 1.043855
11 760 760 760 760 760 760 760 760 760 760 ... 3.989354 -3.989354 0.993924 1.008535 0.988490 1.088812 1.097239 0.993181 1.064015 1.058258
12 832 832 832 832 832 832 832 832 832 832 ... -3.862661 3.862661 0.997719 0.990693 0.997142 1.086328 1.087027 0.990343 1.051262 1.052659
13 915 915 915 915 915 915 915 915 915 915 ... 1.791656 1.791656 1.003994 0.997071 0.998687 1.090667 1.083843 0.989042 1.052709 1.051700
14 992 992 992 992 992 992 992 992 992 992 ... 0.193986 -0.193986 1.004965 0.976378 1.006797 1.096082 1.058241 0.995765 1.061645 1.059580
15 1052 1052 1052 1052 1052 1052 1052 1052 1052 1052 ... -2.180621 2.180621 1.009515 1.007491 0.990477 1.106511 1.066167 0.986283 1.053314 1.050628
16 1128 1128 1128 1128 1128 1128 1128 1128 1128 1128 ... -0.236616 -0.236616 1.000961 1.022304 1.016057 1.107574 1.089947 1.002120 1.051780 1.052845
17 1223 1223 1223 1223 1223 1223 1223 1223 1223 1223 ... -0.527778 0.527778 0.999108 0.996710 0.988060 1.106587 1.086361 0.990155 1.068135 1.071473
18 1348 1348 1348 1348 1348 1348 1348 1348 1348 1348 ... -1.090757 1.090757 0.995765 0.993474 1.001761 1.101900 1.079272 0.991898 1.077318 1.070822
19 1480 1480 1480 1480 1480 1480 1480 1480 1480 1480 ... -1.308060 1.308060 1.019066 0.998929 0.995950 1.122909 1.078115 0.987881 1.078286 1.073573
20 1549 1549 1549 1549 1549 1549 1549 1549 1549 1549 ... -1.421801 -1.421801 0.998408 0.999468 0.993334 1.121121 1.077541 0.981296 1.085593 1.081089
21 1614 1614 1614 1614 1614 1614 1614 1614 1614 1614 ... -0.600387 0.600387 0.996565 0.999695 1.003741 1.117270 1.077213 0.984967 1.090704 1.088463
22 1703 1703 1703 1703 1703 1703 1703 1703 1703 1703 ... 0.390259 0.390259 1.010500 1.006733 1.006129 1.129000 1.084466 0.991004 1.088278 1.087554
23 1798 1798 1798 1798 1798 1798 1798 1798 1798 1798 ... -0.101482 0.101482 1.006278 1.002703 1.006155 1.136088 1.087398 0.997104 1.078095 1.078098
24 1889 1889 1889 1889 1889 1889 1889 1889 1889 1889 ... 2.001191 2.001191 1.019696 1.006804 0.999160 1.158464 1.094796 0.996266 1.078824 1.075391
25 1959 1959 1959 1959 1959 1959 1959 1959 1959 1959 ... 3.975535 3.975535 1.024688 1.026296 0.986022 1.187064 1.123585 0.982340 1.079589 1.073111
26 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ... -7.181572 7.181572 0.990992 0.988986 1.001652 1.176371 1.111210 0.983963 1.093505 1.089073
27 2069 2069 2069 2069 2069 2069 2069 2069 2069 2069 ... 0.216160 0.216160 0.992587 0.991376 0.987666 1.167650 1.101627 0.971828 1.096812 1.094106
28 2161 2161 2161 2161 2161 2161 2161 2161 2161 2161 ... -1.390728 -1.390728 1.010441 1.001136 1.004938 1.179841 1.102878 0.976627 1.093903 1.089657
29 2240 2240 2240 2240 2240 2240 2240 2240 2240 2240 ... -0.767712 0.767712 1.011594 0.996391 1.009238 1.193520 1.098898 0.985649 1.105804 1.108669
30 2320 2320 2320 2320 2320 2320 2320 2320 2320 2320 ... -14.599483 14.599483 0.998742 0.999037 0.997596 1.192019 1.097839 0.983279 1.109429 1.111718
31 2393 2393 2393 2393 2393 2393 2393 2393 2393 2393 ... -1.509872 -1.509872 0.992845 1.001974 1.002645 1.183490 1.100006 0.985880 1.105517 1.107272
32 2464 2464 2464 2464 2464 2464 2464 2464 2464 2464 ... 1.635323 1.635323 1.010289 1.003458 1.008498 1.195667 1.103810 0.994258 1.112604 1.115032
33 2527 2527 2527 2527 2527 2527 2527 2527 2527 2527 ... 0.116429 0.116429 0.997867 0.995095 0.998959 1.193116 1.098395 0.993223 1.113975 1.117608
34 2591 2591 2591 2591 2591 2591 2591 2591 2591 2591 ... 3.185596 3.185596 1.005740 1.003953 1.001200 1.199964 1.102737 0.994415 1.113094 1.116153
35 2692 2692 2692 2692 2692 2692 2692 2692 2692 2692 ... -0.125849 -0.125849 1.007319 0.997495 1.009109 1.208747 1.099975 1.003473 1.112489 1.113034
36 2774 2774 2774 2774 2774 2774 2774 2774 2774 2774 ... -4.263094 4.263094 1.005833 0.995293 0.985575 1.215797 1.094798 0.988998 1.109345 1.110072
37 2886 2886 2886 2886 2886 2886 2886 2886 2886 2886 ... 0.342727 -0.342727 1.018847 1.004337 1.023670 1.238711 1.099546 1.012407 1.116995 1.114797
38 2964 2964 2964 2964 2964 2964 2964 2964 2964 2964 ... -0.709939 -0.709939 1.019083 1.010996 1.015796 1.262350 1.111637 1.028399 1.112660 1.105945
39 3013 3013 3013 3013 3013 3013 3013 3013 3013 3013 ... -0.451904 -0.451904 0.995565 0.999405 0.999499 1.256751 1.110975 1.027885 1.111401 1.105387
40 3078 3078 3078 3078 3078 3078 3078 3078 3078 3078 ... 0.092807 0.092807 0.998999 0.995991 1.003584 1.255494 1.106521 1.031569 1.104150 1.099683
41 3140 3140 3140 3140 3140 3140 3140 3140 3140 3140 ... 10.352188 -10.352188 0.996598 1.011136 1.000641 1.251223 1.118843 1.032230 1.095178 1.091106
42 3205 3205 3205 3205 3205 3205 3205 3205 3205 3205 ... 1.081211 -1.081211 1.004553 1.005661 1.003162 1.256920 1.125177 1.035494 1.092844 1.090122
43 3252 3252 3252 3252 3252 3252 3252 3252 3252 3252 ... 0.551914 -0.551914 0.996361 0.993332 0.999931 1.252347 1.117674 1.035423 1.108871 1.098716
44 3338 3338 3338 3338 3338 3338 3338 3338 3338 3338 ... -0.418035 0.418035 1.015368 0.995008 1.001636 1.271593 1.112094 1.037117 1.112146 1.094594
45 3421 3421 3421 3421 3421 3421 3421 3421 3421 3421 ... 6.149846 -6.149846 1.001564 0.995438 0.989997 1.273582 1.107021 1.026743 1.119875 1.100943
46 3481 3481 3481 3481 3481 3481 3481 3481 3481 3481 ... 0.336409 -0.336409 1.000199 0.998797 1.003925 1.273836 1.105689 1.030772 1.118903 1.101245
47 3590 3590 3590 3590 3590 3590 3590 3590 3590 3590 ... 1.845763 1.845763 0.999817 1.000833 1.001073 1.273603 1.106610 1.031879 1.124481 1.107196
48 3646 3646 3646 3646 3646 3646 3646 3646 3646 3646 ... -1.300822 -1.300822 1.003085 0.997391 1.002216 1.277531 1.103723 1.034166 1.128648 1.109990
49 3728 3728 3728 3728 3728 3728 3728 3728 3728 3728 ... 1.770495 1.770495 1.003434 0.998701 0.993998 1.281918 1.102289 1.027958 1.128500 1.108846

50 rows × 78 columns
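In the rows shown above, `cumreturn1b` appears to be the running product of `dailyaveragereturn`, which is how daily gross returns compound into a cumulative return. The sketch below checks this reading against the first three rows. The relationship is inferred from the displayed values, not from the code that produced them, and the function name `cumulative_returns` is hypothetical.

```python
from itertools import accumulate
from operator import mul


def cumulative_returns(daily_gross_returns):
    """Compound daily gross returns (1.0 = flat) into a cumulative series."""
    return list(accumulate(daily_gross_returns, mul))


# First three dailyaveragereturn values from the table above.
daily = [1.001532, 1.020781, 0.994020]
cumulative = cumulative_returns(daily)
print([round(value, 6) for value in cumulative])
```

The results agree with the displayed `cumreturn1b` values 1.001532, 1.022345 and 1.016231 to about five decimal places. With pandas, `Series.cumprod()` gives the same running product.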